Spectral Unsupervised Parsing with Additive Tree Metrics

Authors

  • Ankur P. Parikh
  • Shay B. Cohen
  • Eric P. Xing
Abstract

We propose a spectral approach for unsupervised constituent parsing that comes with theoretical guarantees on latent structure recovery. Our approach is grammarless – we directly learn the bracketing structure of a given sentence without using a grammar model. The main algorithm is based on lifting the concept of additive tree metrics for structure learning of latent trees in the phylogenetic and machine learning communities to the case where the tree structure varies across examples. Although finding the “minimal” latent tree is NP-hard in general, for the case of projective trees we find that it can be found using bilexical parsing algorithms. Empirically, our algorithm performs favorably compared to the constituent context model of Klein and Manning (2002) without the need for careful initialization.
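As a rough illustration of what "additive tree metric" means in the abstract above, the sketch below builds path-length distances on a small weighted tree and checks the standard four-point condition (for any four leaves, the two largest of the three pairwise distance sums are equal). This is not the authors' algorithm: the toy tree, its edge weights, the node names, and the helper functions `path_distances` and `satisfies_four_point` are all hypothetical and chosen only for the example.

```python
from itertools import combinations


def path_distances(edges, leaves):
    """Return leaf-to-leaf path lengths in a weighted, undirected tree."""
    adj = {}
    for u, v, w in edges:
        adj.setdefault(u, []).append((v, w))
        adj.setdefault(v, []).append((u, w))

    def from_source(src):
        # Depth-first traversal; in a tree each node is reached by one path.
        dist = {src: 0.0}
        stack = [src]
        while stack:
            node = stack.pop()
            for nxt, w in adj[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + w
                    stack.append(nxt)
        return dist

    return {a: {b: d for b, d in from_source(a).items() if b in leaves}
            for a in leaves}


def satisfies_four_point(d, leaves, tol=1e-9):
    """Check the four-point condition that characterizes additive tree metrics."""
    for i, j, k, l in combinations(leaves, 4):
        sums = sorted([d[i][j] + d[k][l], d[i][k] + d[j][l], d[i][l] + d[j][k]])
        if abs(sums[2] - sums[1]) > tol:
            return False
    return True


# Hypothetical toy tree: w1..w4 are observed words (leaves),
# h1 and h2 are latent internal nodes. Weights are made up.
toy_edges = [("w1", "h1", 1.0), ("w2", "h1", 2.0), ("h1", "h2", 0.5),
             ("w3", "h2", 1.5), ("w4", "h2", 1.0)]
toy_leaves = ["w1", "w2", "w3", "w4"]
toy_d = path_distances(toy_edges, toy_leaves)
print(satisfies_four_point(toy_d, toy_leaves))
```

Because the distances here are sums of edge weights along tree paths, the check passes by construction; a distance table that no weighted tree can produce would fail it.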

Similar resources

Material for: Spectral Unsupervised Parsing with Additive Tree Metrics

The primary purpose of the supplemental is to provide the theoretical arguments that our algorithm is correct. We first give the proof that our proposed tree metric is indeed tree additive. We then analyze the consistency of Algorithm 1. Section 1, Path Additivity: We first prove that our proposed tree metric is path additive based on the proof technique in Song et al. (2011). Lemma 1. If Assumption 1 ...

Identifiability and Unmixing of Latent Parse Trees

This paper explores unsupervised learning of parsing models along two directions. First, which models are identifiable from infinite data? We use a general technique for numerically checking identifiability based on the rank of a Jacobian matrix, and apply it to several standard constituency and dependency parsing models. Second, for identifiable models, how do we estimate the p...

Evaluating Unsupervised Part-of-Speech Tagging for Grammar Induction

This paper explores the relationship between various measures of unsupervised part-of-speech tag induction and the performance of both supervised and unsupervised parsing models trained on induced tags. We find that no standard tagging metrics correlate well with unsupervised parsing performance, and several metrics grounded in information theory have no strong relationship with even supervised...

Spectral Probabilistic Modeling and Applications to Natural Language Processing

Probabilistic modeling with latent variables is a powerful paradigm that has led to key advances in many applications such as natural language processing, text mining, and computational biology. Unfortunately, while introducing latent variables substantially increases representation power, learning and modeling can become considerably more complicated. Most existing solutions largely ignore non-id...

An All-Subtrees Approach to Unsupervised Parsing

We investigate generalizations of the all-subtrees "DOP" approach to unsupervised parsing. Unsupervised DOP models assign all possible binary trees to a set of sentences and then use (a large random subset of) all subtrees from these binary trees to compute the most probable parse trees. We will test both a relative frequency estimator for unsupervised DOP and a maximum likelihood estimator whic...

Publication date: 2014